
    Cost estimation of spatial join in SpatialHadoop

    Spatial join is an important operation in geo-spatial applications, since it is frequently used for performing data analysis involving geographical information. Many efforts have been made in the past decades to provide efficient algorithms for spatial join, and this becomes particularly important as the amount of spatial data to be processed increases. In recent years, the MapReduce approach has become a de-facto standard for processing large amounts of data (big data), and some attempts have been made to extend existing frameworks for the processing of spatial data. In this context, several different MapReduce implementations of spatial join have been defined, which mainly differ in the use of a spatial index and in the way this index is built and used. In general, none of these algorithms can be considered better than the others; rather, the choice may depend on the characteristics of the involved datasets. The aim of this work is to analyse them in depth and define a cost model for ranking them based on the characteristics of the dataset at hand (e.g., selectivity or spatial properties). This cost model has been extensively tested against a set of synthetic datasets in order to prove its effectiveness.
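
    As a rough illustration of this kind of cost-model-based ranking, the sketch below estimates a join selectivity from average MBR areas and orders a set of join variants by their estimated cost; the statistics, cost formulas, and variant names are all assumptions for illustration, not the model defined in the paper.

```python
# Hypothetical illustration of a cost-model-based ranking of spatial join variants.
from dataclasses import dataclass

@dataclass
class DatasetStats:
    cardinality: int      # number of geometries in the dataset
    avg_mbr_area: float   # average MBR area, normalized w.r.t. the reference space

def estimated_selectivity(a: DatasetStats, b: DatasetStats) -> float:
    """Rough join selectivity: chance that two randomly placed MBRs overlap."""
    return min(1.0, a.avg_mbr_area + b.avg_mbr_area)

def rank_join_variants(a: DatasetStats, b: DatasetStats, cost_models: dict) -> list:
    """cost_models maps a variant name to a function (a, b, selectivity) -> estimated cost."""
    sel = estimated_selectivity(a, b)
    return sorted(((name, f(a, b, sel)) for name, f in cost_models.items()),
                  key=lambda pair: pair[1])

# Invented cost formulas for two variants: one paying an indexing overhead,
# one paying for comparing many candidate pairs.
models = {
    "index_based": lambda a, b, sel: 0.2 * (a.cardinality + b.cardinality)
                                      + 0.01 * sel * a.cardinality * b.cardinality,
    "brute_force": lambda a, b, sel: 0.05 * a.cardinality * b.cardinality * sel,
}
print(rank_join_variants(DatasetStats(10_000, 1e-4), DatasetStats(5_000, 2e-4), models))
```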

    IDMB: a tool for navigating the Inspire data model and generating an Inspire SQL database and WFS configuration

    The Inspire Data Model Browser (IDMB) is a free tool that performs the following functions: (i) it presents the INSPIRE UML Data Model as a tree-based structure, which is complementary to the UML diagrams; (ii) it generates a PostGIS SQL script for creating an INSPIRE compliant SQL database (Inspire Database), together with a configuration file for the Deegree tool that enables access to the Inspire Database through a Web Feature Service (WFS) producing GML according to the INSPIRE XML Schemas.
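
    A minimal sketch of the database-generation step, assuming a toy tree-shaped description of feature types; the ProtectedSite attributes and the inspire schema name below are illustrative, not the actual IDMB output.

```python
# Minimal sketch (not the actual IDMB output): emitting PostGIS DDL from a
# tree-shaped, dictionary-based description of feature types.
feature_types = {
    "ProtectedSite": {                                    # example INSPIRE feature type
        "inspireid_localid": "text",
        "legalfoundationdate": "timestamp",
        "geometry": "GEOMETRY(MultiPolygon, 4258)",       # ETRS89, commonly used by INSPIRE
    },
}

def to_postgis_ddl(schema: str, types: dict) -> str:
    statements = [f"CREATE SCHEMA IF NOT EXISTS {schema};"]
    for name, attrs in types.items():
        cols = ",\n  ".join(f"{col} {sqltype}" for col, sqltype in attrs.items())
        statements.append(
            f"CREATE TABLE {schema}.{name.lower()} (\n  id serial PRIMARY KEY,\n  {cols}\n);"
        )
    return "\n\n".join(statements)

print(to_postgis_ddl("inspire", feature_types))
```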

    A framework for evaluating 3D topological relations based on a vector data model

    3D topological relations are commonly used for testing or imposing the existence of desired properties between objects of a dataset, such as a city model. Currently available GIS systems usually provide limited 3D support, which typically includes a set of 3D spatial data types together with a few operations and predicates, while little or no support is generally provided for 3D topological relations. Therefore, an important problem to face is how such relations can actually be implemented using the constructs already provided by the available systems. In this paper, we introduce a generic 3D vector model which includes an abstract and formal description of the 3D spatial data types and of the related basic operations and predicates that are commonly provided by GIS systems. Based on this model, we formally demonstrate how these limited sets of operations and predicates can be combined with 2D topological relations for implementing 3D topological relations.
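
    For intuition only: for extruded solids described by a 2D footprint plus a z-extent, a 3D intersection test can be composed from a 2D topological predicate on the footprints and a 1D interval test on the heights. The sketch below (using Shapely for the 2D part) illustrates this composition idea; it is not the formal framework of the paper.

```python
# Illustration only: composing a 3D "intersects" test for extruded solids
# (2D footprint + z-extent) from a 2D topological relation and a 1D interval test.
from shapely.geometry import Polygon

def intersects_3d(footprint_a: Polygon, z_a: tuple, footprint_b: Polygon, z_b: tuple) -> bool:
    z_overlap = z_a[0] <= z_b[1] and z_b[0] <= z_a[1]          # 1D interval overlap on heights
    return z_overlap and footprint_a.intersects(footprint_b)  # 2D topological predicate

building = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
bridge = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)])
print(intersects_3d(building, (0.0, 10.0), bridge, (5.0, 12.0)))  # True: footprints and heights overlap
```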

    Promoting data provenance tracking in the archaeological interpretation process

    In this paper we propose a model and a set of derivation rules for tracking data provenance during the archaeological interpretation process. The interpretation process is the main task performed by an archaeologist who, starting from ground data about evidence and findings, tries to derive knowledge about an ancient object or event. In particular, in this work we concentrate on the dating process used by archaeologists to assign one or more time intervals to a finding in order to define its lifespan on the temporal axis, and we propose a framework to represent such information and infer new knowledge, including the provenance of data. Archaeological data, and in particular their temporal dimension, are typically vague, since many different interpretations can coexist; thus we use Fuzzy Logic to assign a degree of confidence to values and Fuzzy Temporal Constraint Networks to model relationships between the dating of different findings.
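
    A minimal sketch, assuming a trapezoidal membership function, of how a vague dating can be represented as a fuzzy interval on the time axis and how two independent datings can be combined with a fuzzy AND (minimum); the datings below are hypothetical and this is not the authors' set of derivation rules.

```python
# Hypothetical sketch: a vague dating as a trapezoidal fuzzy interval on the time axis.
def trapezoid(a, b, c, d):
    """Membership function, assuming a < b <= c < d: 0 outside (a, d), 1 on [b, c]."""
    def mu(year):
        if year <= a or year >= d:
            return 0.0
        if b <= year <= c:
            return 1.0
        return (year - a) / (b - a) if year < b else (d - year) / (d - c)
    return mu

layer_dating = trapezoid(-50, 0, 300, 350)   # invented dating of a stratigraphic layer
coin_dating = trapezoid(80, 100, 200, 250)   # invented dating of a coin found in that layer

# Degree of confidence that both datings hold at year 120 (fuzzy AND = min).
print(min(layer_dating(120), coin_dating(120)))  # 1.0
```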

    Application of the GeoUML Tools for the Production and Validation of Inspire Datasets

    The structure of INSPIRE datasets is oriented to the exchange of data, not to its storage and manipulation in a database; therefore a data transformation is required. This paper analyses the possibility of using in this context the tools developed by the SpatialDBGroup at Politecnico di Milano for creating and validating spatial databases. The considered scenario is the following one:
    - an organisation (data provider) is willing to provide WFS and GML conformant to the INSPIRE specifications (services and data);
    - this organisation hosts geodata related to one or more INSPIRE themes in a spatial relational database, called here the Source Database;
    - in order to facilitate the implementation of INSPIRE compliant GML data, the organisation implements a new "INSPIRE-structured" spatial database, called here the INSPIRE Database;
    - a Transformation Procedure is created which extracts the data from the Source Database and loads it into the INSPIRE Database;
    - the INSPIRE Database is "validated", also using topological operators, in order to identify gaps with respect to topological constraints.
    We assume that both the Source Database and the INSPIRE Database are SQL based and that their physical schemas have been generated by the GeoUML Catalogue tool from the corresponding conceptual schemas, called SCSOURCE and SCINSPIRE. In this scenario the availability of the conceptual schemas suggests different areas where the tools can provide a great benefit:
    1. creation of the GeoUML specification SCINSPIRE, automatic generation of the corresponding physical SQL structure, and validation of the INSPIRE Database with respect to the specification;
    2. (semi)automatic generation of the Transformation Procedure using a set of correspondence rules between elements of SCSOURCE and SCINSPIRE;
    3. automatic generation of the WFS configuration from SCINSPIRE.
    In this paper we describe the work which has already been done and the research directions which we are following in order to address these points.
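
    To make point 2 concrete, the sketch below assumes correspondence rules expressed as simple (table, column) pairs and generates INSERT ... SELECT statements for the Transformation Procedure; the table and column names are illustrative, not taken from SCSOURCE or SCINSPIRE.

```python
# Hypothetical correspondence rules: (source table, source column) -> (INSPIRE table, INSPIRE column).
correspondences = {
    ("roads", "road_id"): ("tn_roadlink", "inspireid_localid"),
    ("roads", "geom"):    ("tn_roadlink", "centrelinegeometry"),
}

def generate_transformation(rules: dict) -> list:
    """Group the rules by (source, target) table pair and emit one INSERT ... SELECT each."""
    by_pair = {}
    for (src_tab, src_col), (dst_tab, dst_col) in rules.items():
        by_pair.setdefault((src_tab, dst_tab), []).append((src_col, dst_col))
    statements = []
    for (src_tab, dst_tab), cols in by_pair.items():
        dst_cols = ", ".join(dst for _, dst in cols)
        src_cols = ", ".join(src for src, _ in cols)
        statements.append(f"INSERT INTO {dst_tab} ({dst_cols}) SELECT {src_cols} FROM {src_tab};")
    return statements

for stmt in generate_transformation(correspondences):
    print(stmt)
```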

    A context-based approach for partitioning big data

    In recent years, the amount of available data has kept growing at a fast rate, and it is therefore crucial to be able to process it in an efficient way. The level of parallelism in tools such as Hadoop or Spark is determined, among other things, by the partitioning applied to the dataset. A common method is to split the data into chunks considering the number of bytes. While this approach may work well for text-based batch processing, there are a number of cases where the dataset contains structured information, such as time or spatial coordinates, and one may be interested in exploiting such structure to improve the partitioning. This can have an impact on the processing time and increase the overall resource usage efficiency. This paper explores an approach based on the notion of context, such as temporal or spatial information, for partitioning the data. We design a context-based multi-dimensional partitioning technique that divides an n-dimensional space into splits by considering the distribution of each contextual dimension in the dataset. We tested our approach on a dataset from a touristic scenario, and our experiments show that we are able to improve the efficiency of the resource usage.
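
    One possible, simplified realization of distribution-aware partitioning is to cut each contextual dimension at its quantiles, so that splits follow the data distribution rather than the byte count; the quantile-based boundaries below are an assumption for illustration, not necessarily the technique proposed in the paper.

```python
# Simplified sketch: quantile-based split boundaries per contextual dimension.
import numpy as np

def build_boundaries(records: np.ndarray, splits_per_dim: int) -> list:
    """records has shape (n, d); returns, per dimension, the inner quantile cut points."""
    qs = np.linspace(0.0, 1.0, splits_per_dim + 1)[1:-1]
    return [np.quantile(records[:, d], qs) for d in range(records.shape[1])]

def assign_split(record: np.ndarray, boundaries: list) -> tuple:
    """Multi-dimensional split id: one bucket index per contextual dimension."""
    return tuple(int(np.searchsorted(cuts, value)) for value, cuts in zip(record, boundaries))

rng = np.random.default_rng(0)
data = rng.random((1000, 2))            # e.g. (normalized timestamp, normalized position)
bounds = build_boundaries(data, 4)      # 4 buckets per dimension -> up to 16 splits
print(assign_split(data[0], bounds))
```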

    The blockchain role in ethical data acquisition and provisioning

    The collection of personal data through mobile applications and IoT devices represents the core business of many corporations. On the one hand, users are losing control over the ownership of their data and are rarely conscious of what they are sharing and with whom; on the other hand, laws like the European General Data Protection Regulation try to bring data control and ownership back to users. In this paper we discuss the possible impact of blockchain technology in building independent and resilient data management systems able to ensure data ownership and traceability. The use of this technology could play a major role in creating a transparent global market of aggregated personal data where voluntary acquisition is subject to clear rules and some form of incentive, thus not only making the process ethical but also encouraging the sharing of high quality sensitive data.

    What is the role of context in fair group recommendations?

    We investigate the role played by the context, i.e., the situation the group is currently experiencing, in the design of a system that recommends sequences of activities as a multi-objective optimization problem, where the satisfaction of the group and the available time interval are two of the functions to be optimized. In particular, we highlight that the dynamic evolution of the group can be the key contextual feature that has to be considered to produce fair suggestions.
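
    A toy example of the two objectives mentioned above: candidate activity sequences are filtered by the available time interval and scored by the satisfaction of the worst-off member as a simple fairness proxy; all activities, ratings, and the scoring rule are hypothetical, not the system described in the paper.

```python
# Toy example: choose the activity sequence with the best worst-member satisfaction
# among those fitting the available time interval. All data are invented.
def sequence_score(sequence, members, time_budget):
    if sum(act["duration"] for act in sequence) > time_budget:
        return float("-inf")                                   # violates the time objective
    per_member = [sum(act["ratings"][m] for act in sequence) for m in members]
    return min(per_member)                                     # fairness: worst-off member

activities = [
    {"name": "museum", "duration": 90, "ratings": {"ann": 4, "bob": 2}},
    {"name": "park",   "duration": 60, "ratings": {"ann": 3, "bob": 5}},
]
candidates = [[activities[0]], [activities[1]], activities]
best = max(candidates, key=lambda seq: sequence_score(seq, ["ann", "bob"], 180))
print([act["name"] for act in best])  # ['museum', 'park']
```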

    What makes spatial data big? A discussion on how to partition spatial data

    The amount of available spatial data has significantly increased in recent years, so that traditional analysis tools have become inadequate to effectively manage it. Therefore, many attempts have been made to define extensions of existing MapReduce tools, such as Hadoop or Spark, with spatial capabilities in terms of data types and algorithms. Such extensions are mainly based on the partitioning techniques implemented for textual data, where the size of a chunk is given in terms of the number of occupied bytes. However, spatial data are characterized by other features which describe their size, such as the number of vertices or the MBR extent of geometries, and these greatly affect the performance of operations, like the spatial join, during data analysis. The result is that traditional partitioning techniques prevent the benefits of the parallel execution provided by a MapReduce environment from being fully exploited. This paper extensively analyses the problem, considering the spatial join operation as a use case and performing both a theoretical and an experimental analysis for it. Moreover, it provides a solution based on a different partitioning technique, which splits complex or extensive geometries. Finally, we validate the proposed solution by means of some experiments on synthetic and real datasets.
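
    A sketch of the idea of splitting an extensive geometry: the polygon is clipped against a regular grid over its MBR so that each split receives fragments of bounded extent; grid-based clipping (done here with Shapely) is one possible realization, not necessarily the exact technique proposed in the paper.

```python
# Sketch: clip an extensive polygon against a regular grid over its MBR,
# so that no single partition receives the whole geometry.
from shapely.geometry import Polygon, box

def split_by_grid(geom: Polygon, nx: int, ny: int) -> list:
    minx, miny, maxx, maxy = geom.bounds
    dx, dy = (maxx - minx) / nx, (maxy - miny) / ny
    pieces = []
    for i in range(nx):
        for j in range(ny):
            cell = box(minx + i * dx, miny + j * dy, minx + (i + 1) * dx, miny + (j + 1) * dy)
            fragment = geom.intersection(cell)
            if not fragment.is_empty:
                pieces.append(fragment)
    return pieces

large = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
print(len(split_by_grid(large, 4, 4)))  # 16 fragments, one per grid cell
```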

    A cost model for spatial join operations in SpatialHadoop

    Spatial join is an important operation in geo-spatial applications, since it is frequently used for performing data analysis involving geographical information. Many efforts have been made in the past decades to provide efficient algorithms for spatial join, and this is particularly important as the amount of spatial data to be processed increases. In recent years, the MapReduce approach has become a de-facto standard for processing large amounts of data (big data), and some attempts have been made to extend existing frameworks for the processing of spatial data. In this context, SpatialHadoop is an extension of Apache Hadoop which includes native support for spatial data in terms of spatial data types, operations and indexes. In particular, it provides five different variants of spatial join which mainly differ in the use of a spatial index and in the way this index is built and used. In general, none of these algorithms can be considered better than the others; rather, the choice may depend on the characteristics of the involved datasets. The aim of this work is to deeply analyze the characteristics of these algorithms and to define a cost model for them which is based on some dataset characteristics (e.g., selectivity or spatial properties). The main goal of the proposed cost model is to rank the spatial join implementations by defining a partial order among them using a dominance relation. This cost model has been extensively tested against a set of synthetic datasets in order to prove its effectiveness.
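
    A hedged sketch of the dominance relation used to build the partial order: one implementation dominates another if its estimated cost is no worse on every considered metric and strictly better on at least one; the variant names and cost figures below are hypothetical.

```python
# Hypothetical cost vectors for three join variants; dominance builds a partial order.
def dominates(a: dict, b: dict) -> bool:
    """a dominates b if it is no worse on every metric and strictly better on at least one."""
    return all(a[k] <= b[k] for k in a) and any(a[k] < b[k] for k in a)

estimated_costs = {
    "variant_A": {"map_tasks": 64, "io_cost": 9.0},
    "variant_B": {"map_tasks": 40, "io_cost": 6.5},
    "variant_C": {"map_tasks": 40, "io_cost": 7.2},
}

for name_a, cost_a in estimated_costs.items():
    for name_b, cost_b in estimated_costs.items():
        if name_a != name_b and dominates(cost_a, cost_b):
            print(f"{name_a} dominates {name_b}")  # B dominates A and C; C dominates A
```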